Dataset statistics
| Number of variables | 15 |
|---|---|
| Number of observations | 236142 |
| Missing cells | 52575 |
| Missing cells (%) | 1.5% |
| Duplicate rows | 1747 |
| Duplicate rows (%) | 0.7% |
| Total size in memory | 27.0 MiB |
| Average record size in memory | 120.0 B |
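The percentage and per-record figures in this table can be rederived from the raw counts. The following is a minimal sketch under the assumption that "Missing cells (%)" is taken over all cells (variables × observations) and that the memory total is the rounded 27.0 MiB shown above; the variable names are illustrative and not part of the report.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 15
n_observations = 236_142
missing_cells = 52_575
duplicate_rows = 1_747
total_memory_mib = 27.0  # rounded value as printed in the report

# Missing cells are assumed to be counted against every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Duplicates are counted against rows.
duplicate_pct = 100 * duplicate_rows / n_observations

# 1 MiB = 1024**2 bytes; the result is approximate because the total is rounded.
avg_record_bytes = total_memory_mib * 1024**2 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: about {avg_record_bytes:.0f} B")
```

Because the memory total is rounded to one decimal place, the recomputed record size only agrees with the reported 120.0 B to within a fraction of a byte.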
Variable types
| Numeric | 8 |
|---|---|
| Categorical | 7 |
Alerts
| Alert | Type |
|---|---|
| Dataset has 1747 (0.7%) duplicate rows | Duplicates |
| order_date is highly overall correlated with coupon_usage | High correlation |
| delivery_fee is highly overall correlated with delivery_time_in_seconds | High correlation |
| delivery_time_in_seconds is highly overall correlated with delivery_fee | High correlation |
| nb_menu_items is highly overall correlated with restaurant_id and 1 other field | High correlation |
| coupon_usage is highly overall correlated with order_date | High correlation |
| restaurant_category is highly overall correlated with restaurant_id and 3 other fields | High correlation |
| restaurant_type is highly overall correlated with restaurant_category | High correlation |
| province is highly overall correlated with restaurant_category | High correlation |
| restaurant_id is highly overall correlated with restaurant_category and 1 other field | High correlation |
| cooking_time_in_seconds has 25966 (11.0%) missing values | Missing |
| delivery_time_in_seconds has 20804 (8.8%) missing values | Missing |
| nb_menu_items has 5805 (2.5%) missing values | Missing |
| cooking_time_in_seconds is highly skewed (γ1 = 92.33622733) | Skewed |
| delivery_time_in_seconds is highly skewed (γ1 = 64.5824559) | Skewed |
| delivery_fee has 91802 (38.9%) zeros | Zeros |
| food_price has 6378 (2.7%) zeros | Zeros |
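The percentages quoted in the missing-value and zero alerts are shares of the 236142 observations, and the three missing-value counts add up to the 52575 missing cells in the dataset statistics. A minimal sketch of that cross-check; the helper `pct` and the dictionary names are hypothetical and only serve the illustration.

```python
# Counts copied from the alerts; 236142 is the number of observations.
n_observations = 236_142
missing = {
    "cooking_time_in_seconds": 25_966,
    "delivery_time_in_seconds": 20_804,
    "nb_menu_items": 5_805,
}
zeros = {"delivery_fee": 91_802, "food_price": 6_378}


def pct(count, total=n_observations):
    """Share of rows in percent, rounded to one decimal like the report."""
    return round(100 * count / total, 1)


for name, count in {**missing, **zeros}.items():
    print(f"{name}: {count} rows ({pct(count)}%)")

# The per-column missing counts account for every missing cell in the dataset.
total_missing = sum(missing.values())
print("Total missing cells:", total_missing)
```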
Reproduction
| Analysis started | 2022-12-03 10:02:27.899026 |
|---|---|
| Analysis finished | 2022-12-03 10:02:43.212883 |
| Duration | 15.31 seconds |
| Software version | pandas-profiling v3.5.0 |
| Download configuration | config.json |
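The reported duration is simply the difference between the two timestamps. A short sketch using only the Python standard library; the variable names are illustrative.

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" table.
started = datetime.fromisoformat("2022-12-03 10:02:27.899026")
finished = datetime.fromisoformat("2022-12-03 10:02:43.212883")

# Subtracting two datetimes yields a timedelta.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```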
order_date
Real number (ℝ)
| Distinct | 365 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 212.94699 |
| Minimum | 1 |
| Maximum | 365 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 29 |
| Q1 | 122 |
| median | 228 |
| Q3 | 310 |
| 95-th percentile | 356 |
| Maximum | 365 |
| Range | 364 |
| Interquartile range (IQR) | 188 |
Descriptive statistics
| Standard deviation | 106.79004 |
|---|---|
| Coefficient of variation (CV) | 0.50148649 |
| Kurtosis | -1.1787103 |
| Mean | 212.94699 |
| Median Absolute Deviation (MAD) | 92 |
| Skewness | -0.30514585 |
| Sum | 50285729 |
| Variance | 11404.113 |
| Monotonicity | Increasing |
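Several of the order_date figures are derived from one another: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the IQR and range are differences of quantiles. The sketch below rechecks these relations from the rounded values printed above, so agreement is only expected up to rounding; the variable names are illustrative.

```python
# Values copied from the order_date tables (already rounded by the report).
mean = 212.94699
std = 106.79004
q1, q3 = 122, 310
minimum, maximum = 1, 365
total = 50_285_729
n_observations = 236_142

cv = std / mean                        # coefficient of variation
variance = std ** 2                    # variance is the squared standard deviation
iqr = q3 - q1                          # interquartile range
value_range = maximum - minimum        # range
mean_from_sum = total / n_observations  # mean recomputed from the sum

print(f"CV: {cv:.6f}")
print(f"Variance: {variance:.2f}")
print(f"IQR: {iqr}, range: {value_range}")
print(f"Mean from sum: {mean_from_sum:.4f}")
```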
Common Values
| Value | Count | Frequency (%) |
| 358 | 1284 | 0.5% |
| 363 | 1274 | 0.5% |
| 360 | 1269 | 0.5% |
| 359 | 1244 | 0.5% |
| 361 | 1230 | 0.5% |
| 335 | 1217 | 0.5% |
| 362 | 1215 | 0.5% |
| 356 | 1194 | 0.5% |
| 342 | 1179 | 0.5% |
| 320 | 1170 | 0.5% |
| Other values (355) | 223866 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 1 | 398 | |
| 2 | 465 | |
| 3 | 415 | |
| 4 | 413 | |
| 5 | 396 | |
| 6 | 439 | |
| 7 | 391 | |
| 8 | 393 | |
| 9 | 423 | |
| 10 | 371 |
Maximum 10 values
| Value | Count | Frequency (%) |
| 365 | 1121 | |
| 364 | 1088 | |
| 363 | 1274 | |
| 362 | 1215 | |
| 361 | 1230 | |
| 360 | 1269 | |
| 359 | 1244 | |
| 358 | 1284 | |
| 357 | 1145 | |
| 356 | 1194 |
order_day_of_week
Categorical
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.8 MiB |
Length
| Max length | 9 |
|---|---|
| Median length | 8 |
| Mean length | 7.1193816 |
| Min length | 6 |
Characters and Unicode
| Total characters | 1681185 |
|---|---|
| Distinct characters | 17 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Monday |
|---|---|
| 2nd row | Monday |
| 3rd row | Monday |
| 4th row | Monday |
| 5th row | Monday |
Common Values
| Value | Count | Frequency (%) |
| Saturday | 35554 | |
| Sunday | 35310 | |
| Friday | 34420 | |
| Monday | 33743 | |
| Tuesday | 32735 | |
| Thursday | 32650 | |
| Wednesday | 31730 |
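The length statistics for order_day_of_week follow directly from these counts: weighting each day name's length by its count reproduces the total of 1681185 characters and the mean length of about 7.119. A minimal sketch of that calculation, which also fills in the per-day shares that the table leaves blank; the names `day_counts` and `shares` are illustrative.

```python
# Counts copied from the "Common Values" table for order_day_of_week.
day_counts = {
    "Saturday": 35554,
    "Sunday": 35310,
    "Friday": 34420,
    "Monday": 33743,
    "Tuesday": 32735,
    "Thursday": 32650,
    "Wednesday": 31730,
}

n_rows = sum(day_counts.values())

# Each row contributes as many characters as its day name is long.
total_characters = sum(len(day) * count for day, count in day_counts.items())
mean_length = total_characters / n_rows

# Share of rows per day, in percent.
shares = {day: 100 * count / n_rows for day, count in day_counts.items()}

print("Rows:", n_rows)
print("Total characters:", total_characters)
print(f"Mean length: {mean_length:.7f}")
for day, share in shares.items():
    print(f"{day}: {share:.1f}%")
```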
Words
| Value | Count | Frequency (%) |
| saturday | 35554 | |
| sunday | 35310 | |
| friday | 34420 | |
| monday | 33743 | |
| tuesday | 32735 | |
| thursday | 32650 | |
| wednesday | 31730 |
Most occurring characters
| Value | Count | Frequency (%) |
| a | 271696 | |
| d | 267872 | |
| y | 236142 | |
| u | 136249 | |
| r | 102624 | 6.1% |
| n | 100783 | 6.0% |
| s | 97115 | 5.8% |
| e | 96195 | 5.7% |
| S | 70864 | 4.2% |
| T | 65385 | 3.9% |
| Other values (7) | 236260 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 1445043 | |
| Uppercase Letter | 236142 | 14.0% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 271696 | |
| d | 267872 | |
| y | 236142 | |
| u | 136249 | |
| r | 102624 | 7.1% |
| n | 100783 | 7.0% |
| s | 97115 | 6.7% |
| e | 96195 | 6.7% |
| t | 35554 | 2.5% |
| i | 34420 | 2.4% |
| Other values (2) | 66393 | 4.6% |
Uppercase Letter
| Value | Count | Frequency (%) |
| S | 70864 | |
| T | 65385 | |
| F | 34420 | |
| M | 33743 | |
| W | 31730 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 1681185 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| a | 271696 | |
| d | 267872 | |
| y | 236142 | |
| u | 136249 | |
| r | 102624 | 6.1% |
| n | 100783 | 6.0% |
| s | 97115 | 5.8% |
| e | 96195 | 5.7% |
| S | 70864 | 4.2% |
| T | 65385 | 3.9% |
| Other values (7) | 236260 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1681185 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| a | 271696 | |
| d | 267872 | |
| y | 236142 | |
| u | 136249 | |
| r | 102624 | 6.1% |
| n | 100783 | 6.0% |
| s | 97115 | 5.8% |
| e | 96195 | 5.7% |
| S | 70864 | 4.2% |
| T | 65385 | 3.9% |
| Other values (7) | 236260 |
order_hour
Real number (ℝ)
| Distinct | 24 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 14.544892 |
| Minimum | 0 |
| Maximum | 23 |
| Zeros | 1345 |
| Zeros (%) | 0.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.8 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 9 |
| Q1 | 11 |
| median | 14 |
| Q3 | 18 |
| 95-th percentile | 21 |
| Maximum | 23 |
| Range | 23 |
| Interquartile range (IQR) | 7 |
Descriptive statistics
| Standard deviation | 4.0859192 |
|---|---|
| Coefficient of variation (CV) | 0.2809178 |
| Kurtosis | -0.10332688 |
| Mean | 14.544892 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | -0.16435209 |
| Sum | 3434660 |
| Variance | 16.694736 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 11 | 27374 | |
| 12 | 24302 | |
| 18 | 21337 | |
| 13 | 18967 | 8.0% |
| 19 | 18260 | 7.7% |
| 17 | 17293 | 7.3% |
| 14 | 16315 | 6.9% |
| 10 | 16203 | 6.9% |
| 15 | 14084 | 6.0% |
| 16 | 14011 | 5.9% |
| Other values (14) | 47996 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 1345 | 0.6% |
| 1 | 364 | 0.2% |
| 2 | 195 | 0.1% |
| 3 | 159 | 0.1% |
| 4 | 190 | 0.1% |
| 5 | 60 | < 0.1% |
| 6 | 751 | 0.3% |
| 7 | 2147 | 0.9% |
| 8 | 5106 | |
| 9 | 8959 |
Maximum 10 values
| Value | Count | Frequency (%) |
| 23 | 2902 | 1.2% |
| 22 | 5047 | 2.1% |
| 21 | 7983 | 3.4% |
| 20 | 12788 | |
| 19 | 18260 | |
| 18 | 21337 | |
| 17 | 17293 | |
| 16 | 14011 | |
| 15 | 14084 | |
| 14 | 16315 |
delivery_status
Categorical
| Distinct | 18 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.8 MiB |
| COMPLETED | 215367 |
|---|---|
| CANCELED_BY_USER | 10464 |
| CANCELED_BY_RESTAURANT | 7573 |
| EXPIRED_BY_DRIVER | 1941 |
| FAILED | 246 |
| Other values (13) | 551 |
Length
| Max length | 25 |
|---|---|
| Median length | 9 |
| Mean length | 9.8051935 |
| Min length | 6 |
Characters and Unicode
| Total characters | 2315418 |
|---|---|
| Distinct characters | 24 |
| Distinct categories | 2 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 4 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | COMPLETED |
|---|---|
| 2nd row | COMPLETED |
| 3rd row | COMPLETED |
| 4th row | COMPLETED |
| 5th row | COMPLETED |
Common Values
| Value | Count | Frequency (%) |
| COMPLETED | 215367 | |
| CANCELED_BY_USER | 10464 | 4.4% |
| CANCELED_BY_RESTAURANT | 7573 | 3.2% |
| EXPIRED_BY_DRIVER | 1941 | 0.8% |
| FAILED | 246 | 0.1% |
| CANCELED_BY_DRIVER | 233 | 0.1% |
| CANCELED_BY_CS | 192 | 0.1% |
| DROP_OFF_DONE | 88 | < 0.1% |
| QUOTED | 8 | < 0.1% |
| PICK_UP_FAILED | 8 | < 0.1% |
| Other values (8) | 22 | < 0.1% |
Words
| Value | Count | Frequency (%) |
| completed | 215367 | |
| canceled_by_user | 10464 | 4.4% |
| canceled_by_restaurant | 7573 | 3.2% |
| expired_by_driver | 1941 | 0.8% |
| failed | 246 | 0.1% |
| canceled_by_driver | 233 | 0.1% |
| canceled_by_cs | 192 | 0.1% |
| drop_off_done | 88 | < 0.1% |
| pick_up_failed | 8 | < 0.1% |
| quoted | 8 | < 0.1% |
| Other values (8) | 22 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| E | 492166 | |
| C | 252506 | |
| D | 238420 | |
| L | 234083 | |
| T | 230561 | |
| P | 217416 | |
| O | 215654 | |
| M | 215368 | |
| _ | 41038 | 1.8% |
| A | 33885 | 1.5% |
| Other values (14) | 144321 | 6.2% |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 2274380 | |
| Connector Punctuation | 41038 | 1.8% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| E | 492166 | |
| C | 252506 | |
| D | 238420 | |
| L | 234083 | |
| T | 230561 | |
| P | 217416 | |
| O | 215654 | |
| M | 215368 | |
| A | 33885 | 1.5% |
| R | 32049 | 1.4% |
| Other values (13) | 112272 | 4.9% |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 41038 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 2274380 | |
| Common | 41038 | 1.8% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| E | 492166 | |
| C | 252506 | |
| D | 238420 | |
| L | 234083 | |
| T | 230561 | |
| P | 217416 | |
| O | 215654 | |
| M | 215368 | |
| A | 33885 | 1.5% |
| R | 32049 | 1.4% |
| Other values (13) | 112272 | 4.9% |
Common
| Value | Count | Frequency (%) |
| _ | 41038 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 2315418 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| E | 492166 | |
| C | 252506 | |
| D | 238420 | |
| L | 234083 | |
| T | 230561 | |
| P | 217416 | |
| O | 215654 | |
| M | 215368 | |
| _ | 41038 | 1.8% |
| A | 33885 | 1.5% |
| Other values (14) | 144321 | 6.2% |
payment_method
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.8 MiB |
Length
| Max length | 19 |
|---|---|
| Median length | 4 |
| Mean length | 5.3817534 |
| Min length | 3 |
Characters and Unicode
| Total characters | 1270858 |
|---|---|
| Distinct characters | 14 |
| Distinct categories | 2 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | CASH |
|---|---|
| 2nd row | CASH |
| 3rd row | CASH |
| 4th row | CASH |
| 5th row | CASH |
Common Values
| Value | Count | Frequency (%) |
| CASH | 191968 | |
| LINEMAN_CREDIT_CARD | 23154 | 9.8% |
| RLP | 21020 | 8.9% |
Words
| Value | Count | Frequency (%) |
| cash | 191968 | |
| lineman_credit_card | 23154 | 9.8% |
| rlp | 21020 | 8.9% |
Most occurring characters
| Value | Count | Frequency (%) |
| C | 238276 | |
| A | 238276 | |
| S | 191968 | |
| H | 191968 | |
| R | 67328 | 5.3% |
| I | 46308 | 3.6% |
| N | 46308 | 3.6% |
| E | 46308 | 3.6% |
| _ | 46308 | 3.6% |
| D | 46308 | 3.6% |
| Other values (4) | 111502 |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 1224550 | |
| Connector Punctuation | 46308 | 3.6% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| C | 238276 | |
| A | 238276 | |
| S | 191968 | |
| H | 191968 | |
| R | 67328 | 5.5% |
| I | 46308 | 3.8% |
| N | 46308 | 3.8% |
| E | 46308 | 3.8% |
| D | 46308 | 3.8% |
| L | 44174 | 3.6% |
| Other values (3) | 67328 | 5.5% |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 46308 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 1224550 | |
| Common | 46308 | 3.6% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| C | 238276 | |
| A | 238276 | |
| S | 191968 | |
| H | 191968 | |
| R | 67328 | 5.5% |
| I | 46308 | 3.8% |
| N | 46308 | 3.8% |
| E | 46308 | 3.8% |
| D | 46308 | 3.8% |
| L | 44174 | 3.6% |
| Other values (3) | 67328 | 5.5% |
Common
| Value | Count | Frequency (%) |
| _ | 46308 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1270858 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| C | 238276 | |
| A | 238276 | |
| S | 191968 | |
| H | 191968 | |
| R | 67328 | 5.3% |
| I | 46308 | 3.6% |
| N | 46308 | 3.6% |
| E | 46308 | 3.6% |
| _ | 46308 | 3.6% |
| D | 46308 | 3.6% |
| Other values (4) | 111502 |
coupon_usage
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.8 MiB |
Length
| Max length | 11 |
|---|---|
| Median length | 11 |
| Mean length | 10.63909 |
| Min length | 9 |
Characters and Unicode
| Total characters | 2512336 |
|---|---|
| Distinct characters | 9 |
| Distinct categories | 2 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | NO COUPON |
|---|---|
| 2nd row | COUPON USED |
| 3rd row | NO COUPON |
| 4th row | NO COUPON |
| 5th row | NO COUPON |
Common Values
| Value | Count | Frequency (%) |
| COUPON USED | 193529 | |
| NO COUPON | 42613 | 18.0% |
Words
| Value | Count | Frequency (%) |
| coupon | 236142 | |
| used | 193529 | |
| no | 42613 | 9.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| O | 514897 | |
| U | 429671 | |
| N | 278755 | |
| C | 236142 | |
| P | 236142 | |
| (space) | 236142 | |
| S | 193529 | 7.7% |
| E | 193529 | 7.7% |
| D | 193529 | 7.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 2276194 | |
| Space Separator | 236142 | 9.4% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| O | 514897 | |
| U | 429671 | |
| N | 278755 | |
| C | 236142 | |
| P | 236142 | |
| S | 193529 | 8.5% |
| E | 193529 | 8.5% |
| D | 193529 | 8.5% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 236142 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 2276194 | |
| Common | 236142 | 9.4% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| O | 514897 | |
| U | 429671 | |
| N | 278755 | |
| C | 236142 | |
| P | 236142 | |
| S | 193529 | 8.5% |
| E | 193529 | 8.5% |
| D | 193529 | 8.5% |
Common
| Value | Count | Frequency (%) |
| (space) | 236142 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 2512336 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| O | 514897 | |
| U | 429671 | |
| N | 278755 | |
| C | 236142 | |
| P | 236142 | |
| (space) | 236142 | |
| S | 193529 | 7.7% |
| E | 193529 | 7.7% |
| D | 193529 | 7.7% |
delivery_fee
Real number (ℝ)
| Distinct | 385 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 16.205973 |
| Minimum | 0 |
| Maximum | 1182 |
| Zeros | 91802 |
| Zeros (%) | 38.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.8 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 10 |
| Q3 | 15 |
| 95-th percentile | 73 |
| Maximum | 1182 |
| Range | 1182 |
| Interquartile range (IQR) | 15 |
Descriptive statistics
| Standard deviation | 30.912053 |
|---|---|
| Coefficient of variation (CV) | 1.9074482 |
| Kurtosis | 48.825425 |
| Mean | 16.205973 |
| Median Absolute Deviation (MAD) | 10 |
| Skewness | 4.8844479 |
| Sum | 3826910.9 |
| Variance | 955.55505 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 0 | 91802 | |
| 10 | 45572 | |
| 15 | 20556 | 8.7% |
| 5 | 19845 | 8.4% |
| 20 | 16067 | 6.8% |
| 29 | 4262 | 1.8% |
| 25 | 3927 | 1.7% |
| 40 | 2885 | 1.2% |
| 50 | 2800 | 1.2% |
| 62 | 2538 | 1.1% |
| Other values (375) | 25888 | 11.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 91802 | |
| 1 | 1010 | 0.4% |
| 2 | 299 | 0.1% |
| 3 | 22 | < 0.1% |
| 4 | 19 | < 0.1% |
| 5 | 19845 | 8.4% |
| 6 | 323 | 0.1% |
| 7 | 47 | < 0.1% |
| 8 | 119 | 0.1% |
| 9 | 1234 | 0.5% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 1182 | 1 | < 0.1% |
| 697 | 5 | |
| 691 | 2 | < 0.1% |
| 689 | 1 | < 0.1% |
| 681 | 9 | |
| 596 | 1 | < 0.1% |
| 585 | 1 | < 0.1% |
| 575 | 1 | < 0.1% |
| 573 | 1 | < 0.1% |
| 571 | 2 | < 0.1% |
food_price
Real number (ℝ)
| Distinct | 1814 |
|---|---|
| Distinct (%) | 0.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 196.77864 |
| Minimum | 0 |
| Maximum | 7001 |
| Zeros | 6378 |
| Zeros (%) | 2.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.8 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 50 |
| Q1 | 80 |
| median | 138 |
| Q3 | 230 |
| 95-th percentile | 566 |
| Maximum | 7001 |
| Range | 7001 |
| Interquartile range (IQR) | 150 |
Descriptive statistics
| Standard deviation | 206.84037 |
|---|---|
| Coefficient of variation (CV) | 1.0511322 |
| Kurtosis | 26.920898 |
| Mean | 196.77864 |
| Median Absolute Deviation (MAD) | 65 |
| Skewness | 3.8469509 |
| Sum | 46467701 |
| Variance | 42782.937 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 60 | 8642 | 3.7% |
| 120 | 7056 | 3.0% |
| 100 | 6986 | 3.0% |
| 0 | 6378 | 2.7% |
| 50 | 6357 | 2.7% |
| 55 | 5532 | 2.3% |
| 65 | 5321 | 2.3% |
| 70 | 5306 | 2.2% |
| 110 | 5184 | 2.2% |
| 150 | 4836 | 2.0% |
| Other values (1804) | 174544 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 6378 | |
| 1 | 1 | < 0.1% |
| 9 | 1 | < 0.1% |
| 10 | 5 | < 0.1% |
| 13 | 1 | < 0.1% |
| 14 | 1 | < 0.1% |
| 15 | 4 | < 0.1% |
| 18.69 | 2 | < 0.1% |
| 20 | 16 | < 0.1% |
| 24 | 31 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 7001 | 1 | |
| 5196 | 1 | |
| 4577 | 1 | |
| 4165 | 1 | |
| 3560 | 1 | |
| 2997 | 1 | |
| 2978 | 1 | |
| 2960 | 1 | |
| 2885 | 2 | |
| 2837 | 1 |
cooking_time_in_seconds
Real number (ℝ)
| Distinct | 3366 |
|---|---|
| Distinct (%) | 1.6% |
| Missing | 25966 |
| Missing (%) | 11.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 782.58559 |
| Minimum | 0 |
| Maximum | 156541 |
| Zeros | 3 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.8 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 258 |
| Q1 | 468 |
| median | 681 |
| Q3 | 980 |
| 95-th percentile | 1640 |
| Maximum | 156541 |
| Range | 156541 |
| Interquartile range (IQR) | 512 |
Descriptive statistics
| Standard deviation | 612.56911 |
|---|---|
| Coefficient of variation (CV) | 0.78275031 |
| Kurtosis | 21350.077 |
| Mean | 782.58559 |
| Median Absolute Deviation (MAD) | 244 |
| Skewness | 92.336227 |
| Sum | 1.6448071 × 10⁸ |
| Variance | 375240.91 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 461 | 295 | 0.1% |
| 541 | 289 | 0.1% |
| 526 | 282 | 0.1% |
| 556 | 281 | 0.1% |
| 518 | 280 | 0.1% |
| 546 | 279 | 0.1% |
| 538 | 279 | 0.1% |
| 613 | 278 | 0.1% |
| 494 | 277 | 0.1% |
| 577 | 276 | 0.1% |
| Other values (3356) | 207360 | |
| (Missing) | 25966 | 11.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 3 | |
| 1 | 2 | |
| 2 | 3 | |
| 3 | 3 | |
| 4 | 1 | < 0.1% |
| 5 | 3 | |
| 6 | 4 | |
| 7 | 4 | |
| 8 | 3 | |
| 9 | 4 |
Maximum 10 values
| Value | Count | Frequency (%) |
| 156541 | 1 | |
| 76956 | 1 | |
| 53582 | 1 | |
| 36755 | 1 | |
| 15809 | 1 | |
| 14321 | 1 | |
| 12204 | 1 | |
| 8979 | 1 | |
| 8168 | 1 | |
| 7199 | 1 |
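The skewness of 92.336227 reported for this variable (the cooking_time_in_seconds alert) is driven by a handful of very large values. One common way to make that concrete is Tukey's 1.5 × IQR fences; this is a convention brought in here for illustration, not something the report itself computes. The sketch below applies it to the quartiles and the ten largest values listed above.

```python
# Quartiles and largest values copied from the tables above (seconds).
q1, q3 = 468, 980
largest_values = [156541, 76956, 53582, 36755, 15809,
                  14321, 12204, 8979, 8168, 7199]

# Tukey's fences: values beyond 1.5 * IQR from the quartiles are flagged.
# This rule is an assumption for illustration, not part of the report.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

flagged = [value for value in largest_values if value > upper_fence]

print(f"IQR: {iqr} s, upper fence: {upper_fence:.0f} s")
print(f"{len(flagged)} of {len(largest_values)} listed maxima exceed the fence")
print(f"Largest value in hours: {max(largest_values) / 3600:.1f}")
```

Under this rule every one of the ten listed maxima lies far above the fence, which is consistent with the extreme skewness and kurtosis figures.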
delivery_time_in_seconds
Real number (ℝ)
| Distinct | 2964 |
|---|---|
| Distinct (%) | 1.4% |
| Missing | 20804 |
| Missing (%) | 8.8% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 541.57356 |
| Minimum | 0 |
| Maximum | 101097 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.8 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 137 |
| Q1 | 317 |
| median | 468 |
| Q3 | 671 |
| 95-th percentile | 1161 |
| Maximum | 101097 |
| Range | 101097 |
| Interquartile range (IQR) | 354 |
Descriptive statistics
| Standard deviation | 496.37308 |
|---|---|
| Coefficient of variation (CV) | 0.9165386 |
| Kurtosis | 10277.512 |
| Mean | 541.57356 |
| Median Absolute Deviation (MAD) | 171 |
| Skewness | 64.582456 |
| Sum | 1.1662137 × 10⁸ |
| Variance | 246386.23 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 3 | 667 | 0.3% |
| 4 | 483 | 0.2% |
| 2 | 459 | 0.2% |
| 403 | 416 | 0.2% |
| 375 | 407 | 0.2% |
| 353 | 402 | 0.2% |
| 347 | 399 | 0.2% |
| 385 | 397 | 0.2% |
| 335 | 394 | 0.2% |
| 369 | 392 | 0.2% |
| Other values (2954) | 210922 | |
| (Missing) | 20804 | 8.8% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 1 | < 0.1% |
| 1 | 26 | < 0.1% |
| 2 | 459 | |
| 3 | 667 | |
| 4 | 483 | |
| 5 | 272 | |
| 6 | 201 | 0.1% |
| 7 | 142 | 0.1% |
| 8 | 135 | 0.1% |
| 9 | 126 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 101097 | 1 | |
| 55967 | 1 | |
| 53985 | 1 | |
| 50289 | 1 | |
| 45512 | 1 | |
| 44363 | 1 | |
| 23329 | 1 | |
| 19957 | 1 | |
| 12787 | 1 | |
| 12584 | 1 |
restaurant_id
Real number (ℝ)
| Distinct | 200 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 98.985221 |
| Minimum | 1 |
| Maximum | 200 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 12 |
| Q1 | 55 |
| median | 104 |
| Q3 | 142 |
| 95-th percentile | 179 |
| Maximum | 200 |
| Range | 199 |
| Interquartile range (IQR) | 87 |
Descriptive statistics
| Standard deviation | 52.467492 |
|---|---|
| Coefficient of variation (CV) | 0.5300538 |
| Kurtosis | -1.0562857 |
| Mean | 98.985221 |
| Median Absolute Deviation (MAD) | 41 |
| Skewness | -0.12205706 |
| Sum | 23374568 |
| Variance | 2752.8378 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 158 | 4883 | 2.1% |
| 129 | 4879 | 2.1% |
| 75 | 4566 | 1.9% |
| 39 | 4547 | 1.9% |
| 123 | 4388 | 1.9% |
| 152 | 4338 | 1.8% |
| 174 | 4164 | 1.8% |
| 122 | 4021 | 1.7% |
| 55 | 3999 | 1.7% |
| 86 | 3966 | 1.7% |
| Other values (190) | 192391 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 1 | 617 | 0.3% |
| 2 | 320 | 0.1% |
| 3 | 1662 | |
| 4 | 1646 | |
| 5 | 2370 | |
| 6 | 854 | 0.4% |
| 7 | 993 | |
| 8 | 884 | 0.4% |
| 9 | 299 | 0.1% |
| 10 | 372 | 0.2% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 200 | 219 | 0.1% |
| 199 | 897 | |
| 198 | 321 | 0.1% |
| 197 | 220 | 0.1% |
| 196 | 187 | 0.1% |
| 195 | 216 | 0.1% |
| 194 | 354 | 0.1% |
| 193 | 480 | |
| 192 | 194 | 0.1% |
| 191 | 839 |
restaurant_category
Categorical
| Distinct | 33 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.8 MiB |
Length
| Max length | 28 |
|---|---|
| Median length | 20 |
| Mean length | 10.579338 |
| Min length | 4 |
Characters and Unicode
| Total characters | 2498226 |
|---|---|
| Distinct characters | 46 |
| Distinct categories | 4 |
| Distinct scripts | 2 |
| Distinct blocks | 2 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Rice Dish |
|---|---|
| 2nd row | Rice Dish |
| 3rd row | Delivery Only |
| 4th row | Street Food/Food Stands |
| 5th row | Rice Dish |
Common Values
| Value | Count | Frequency (%) |
| À La Carte | 26977 | |
| Rice Dish | 24427 | 10.3% |
| North East | 22397 | 9.5% |
| Café/Coffee Shop | 20377 | 8.6% |
| Noodles | 20143 | 8.5% |
| Thai | 18977 | 8.0% |
| Bakery/Cake | 11532 | 4.9% |
| Dessert | 10741 | 4.5% |
| Bubble Milk Tea | 8713 | 3.7% |
| Delivery Only | 8409 | 3.6% |
| Other values (23) | 63449 |
Words
| Value | Count | Frequency (%) |
| rice | 30173 | 7.1% |
| à | 26977 | 6.3% |
| la | 26977 | 6.3% |
| carte | 26977 | 6.3% |
| dish | 24427 | 5.7% |
| north | 22397 | 5.3% |
| east | 22397 | 5.3% |
| café/coffee | 20377 | 4.8% |
| shop | 20377 | 4.8% |
| noodles | 20143 | 4.7% |
| Other values (40) | 185085 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 279377 | 11.2% |
| a | 205745 | 8.2% |
| (space) | 190165 | 7.6% |
| o | 178387 | 7.1% |
| i | 138156 | 5.5% |
| t | 124111 | 5.0% |
| r | 118582 | 4.7% |
| s | 112057 | 4.5% |
| h | 109815 | 4.4% |
| C | 87113 | 3.5% |
| Other values (36) | 954718 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 1772415 | |
| Uppercase Letter | 479264 | 19.2% |
| Space Separator | 190165 | 7.6% |
| Other Punctuation | 56382 | 2.3% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 279377 | |
| a | 205745 | |
| o | 178387 | |
| i | 138156 | 7.8% |
| t | 124111 | 7.0% |
| r | 118582 | 6.7% |
| s | 112057 | 6.3% |
| h | 109815 | 6.2% |
| f | 82106 | 4.6% |
| l | 60219 | 3.4% |
| Other values (14) | 363860 |
Uppercase Letter
| Value | Count | Frequency (%) |
| C | 87113 | |
| S | 53705 | |
| N | 52307 | |
| D | 43577 | |
| L | 32723 | 6.8% |
| T | 31115 | 6.5% |
| R | 30173 | 6.3% |
| B | 29950 | 6.2% |
| À | 26977 | 5.6% |
| E | 22397 | 4.7% |
| Other values (10) | 69227 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 190165 |
Other Punctuation
| Value | Count | Frequency (%) |
| / | 56382 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 2251679 | |
| Common | 246547 | 9.9% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 279377 | 12.4% |
| a | 205745 | 9.1% |
| o | 178387 | 7.9% |
| i | 138156 | 6.1% |
| t | 124111 | 5.5% |
| r | 118582 | 5.3% |
| s | 112057 | 5.0% |
| h | 109815 | 4.9% |
| C | 87113 | 3.9% |
| f | 82106 | 3.6% |
| Other values (34) | 816230 |
Common
| Value | Count | Frequency (%) |
| (space) | 190165 | |
| / | 56382 | 22.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 2450872 | |
| None | 47354 | 1.9% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 279377 | 11.4% |
| a | 205745 | 8.4% |
| (space) | 190165 | 7.8% |
| o | 178387 | 7.3% |
| i | 138156 | 5.6% |
| t | 124111 | 5.1% |
| r | 118582 | 4.8% |
| s | 112057 | 4.6% |
| h | 109815 | 4.5% |
| C | 87113 | 3.6% |
| Other values (34) | 907364 |
None
| Value | Count | Frequency (%) |
| À | 26977 | |
| é | 20377 |
restaurant_type
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.8 MiB |
Length
| Max length | 16 |
|---|---|
| Median length | 9 |
| Mean length | 10.246703 |
| Min length | 9 |
Characters and Unicode
| Total characters | 2419677 |
|---|---|
| Distinct characters | 13 |
| Distinct categories | 3 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | NON-CHAIN |
|---|---|
| 2nd row | CHAIN_RESTAURANT |
| 3rd row | NON-CHAIN |
| 4th row | NON-CHAIN |
| 5th row | NON-CHAIN |
Common Values
| Value | Count | Frequency (%) |
| NON-CHAIN | 194085 | |
| CHAIN_RESTAURANT | 42057 | 17.8% |
Words
| Value | Count | Frequency (%) |
| non-chain | 194085 | |
| chain_restaurant | 42057 | 17.8% |
Most occurring characters
| Value | Count | Frequency (%) |
| N | 666369 | |
| A | 320256 | |
| C | 236142 | 9.8% |
| H | 236142 | 9.8% |
| I | 236142 | 9.8% |
| O | 194085 | 8.0% |
| - | 194085 | 8.0% |
| R | 84114 | 3.5% |
| T | 84114 | 3.5% |
| _ | 42057 | 1.7% |
| Other values (3) | 126171 | 5.2% |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 2183535 | |
| Dash Punctuation | 194085 | 8.0% |
| Connector Punctuation | 42057 | 1.7% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| N | 666369 | |
| A | 320256 | |
| C | 236142 | 10.8% |
| H | 236142 | 10.8% |
| I | 236142 | 10.8% |
| O | 194085 | 8.9% |
| R | 84114 | 3.9% |
| T | 84114 | 3.9% |
| E | 42057 | 1.9% |
| S | 42057 | 1.9% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 194085 |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 42057 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 2183535 | |
| Common | 236142 | 9.8% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| N | 666369 | |
| A | 320256 | |
| C | 236142 | 10.8% |
| H | 236142 | 10.8% |
| I | 236142 | 10.8% |
| O | 194085 | 8.9% |
| R | 84114 | 3.9% |
| T | 84114 | 3.9% |
| E | 42057 | 1.9% |
| S | 42057 | 1.9% |
Common
| Value | Count | Frequency (%) |
| - | 194085 | |
| _ | 42057 | 17.8% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 2419677 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| N | 666369 | |
| A | 320256 | |
| C | 236142 | 9.8% |
| H | 236142 | 9.8% |
| I | 236142 | 9.8% |
| O | 194085 | 8.0% |
| - | 194085 | 8.0% |
| R | 84114 | 3.5% |
| T | 84114 | 3.5% |
| _ | 42057 | 1.7% |
| Other values (3) | 126171 | 5.2% |
province
Categorical
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.8 MiB |
Length
| Max length | 13 |
|---|---|
| Median length | 7 |
| Mean length | 9.100952 |
| Min length | 7 |
Characters and Unicode
| Total characters | 2149117 |
|---|---|
| Distinct characters | 18 |
| Distinct categories | 3 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Nakhon Pathom |
|---|---|
| 2nd row | Bangkok |
| 3rd row | Bangkok |
| 4th row | Samut Sakhon |
| 5th row | Nakhon Pathom |
Common Values
| Value | Count | Frequency (%) |
| Bangkok | 126936 | |
| Pathum Thani | 30071 | 12.7% |
| Samut Prakan | 28624 | 12.1% |
| Nonthaburi | 28482 | 12.1% |
| Samut Sakhon | 14972 | 6.3% |
| Nakhon Pathom | 7057 | 3.0% |
Words
| Value | Count | Frequency (%) |
| bangkok | 126936 | |
| samut | 43596 | 13.8% |
| pathum | 30071 | 9.5% |
| thani | 30071 | 9.5% |
| prakan | 28624 | 9.0% |
| nonthaburi | 28482 | 9.0% |
| sakhon | 14972 | 4.7% |
| nakhon | 7057 | 2.2% |
| pathom | 7057 | 2.2% |
Most occurring characters
| Value | Count | Frequency (%) |
| a | 345490 | |
| k | 304525 | |
| n | 236142 | |
| o | 184504 | |
| B | 126936 | 5.9% |
| g | 126936 | 5.9% |
| h | 117710 | 5.5% |
| t | 109206 | 5.1% |
| u | 102149 | 4.8% |
| m | 80724 | 3.8% |
| Other values (8) | 414795 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 1751527 | |
| Uppercase Letter | 316866 | 14.7% |
| Space Separator | 80724 | 3.8% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 345490 | |
| k | 304525 | |
| n | 236142 | |
| o | 184504 | |
| g | 126936 | 7.2% |
| h | 117710 | 6.7% |
| t | 109206 | 6.2% |
| u | 102149 | 5.8% |
| m | 80724 | 4.6% |
| i | 58553 | 3.3% |
| Other values (2) | 85588 | 4.9% |
Uppercase Letter
| Value | Count | Frequency (%) |
| B | 126936 | |
| P | 65752 | |
| S | 58568 | |
| N | 35539 | 11.2% |
| T | 30071 | 9.5% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 80724 |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 2068393 | 96.2% |
| Common | 80724 | 3.8% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
|---|---|---|
| a | 345490 | 16.7% |
| k | 304525 | 14.7% |
| n | 236142 | 11.4% |
| o | 184504 | 8.9% |
| B | 126936 | 6.1% |
| g | 126936 | 6.1% |
| h | 117710 | 5.7% |
| t | 109206 | 5.3% |
| u | 102149 | 4.9% |
| m | 80724 | 3.9% |
| Other values (7) | 334071 | 16.2% |
Common
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 80724 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 2149117 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| a | 345490 | 16.1% |
| k | 304525 | 14.2% |
| n | 236142 | 11.0% |
| o | 184504 | 8.6% |
| B | 126936 | 5.9% |
| g | 126936 | 5.9% |
| h | 117710 | 5.5% |
| t | 109206 | 5.1% |
| u | 102149 | 4.8% |
| m | 80724 | 3.8% |
| Other values (8) | 414795 | 19.3% |
nb_menu_items
Real number (ℝ)
| Distinct | 111 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 5805 |
| Missing (%) | 2.5% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 85.563557 |

| Minimum | 1 |
|---|---|
| Maximum | 969 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 8 |
| Q1 | 34 |
| median | 61 |
| Q3 | 106 |
| 95-th percentile | 301 |
| Maximum | 969 |
| Range | 968 |
| Interquartile range (IQR) | 72 |
Descriptive statistics
| Standard deviation | 82.752999 |
|---|---|
| Coefficient of variation (CV) | 0.9671524 |
| Kurtosis | 18.17982 |
| Mean | 85.563557 |
| Median Absolute Deviation (MAD) | 36 |
| Skewness | 2.8469687 |
| Sum | 19708453 |
| Variance | 6848.0588 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 61 | 9629 | 4.1% |
| 41 | 7434 | 3.1% |
| 118 | 7413 | 3.1% |
| 65 | 6984 | 3.0% |
| 21 | 6755 | 2.9% |
| 58 | 6281 | 2.7% |
| 46 | 5636 | 2.4% |
| 63 | 5092 | 2.2% |
| 38 | 4868 | 2.1% |
| 25 | 4666 | 2.0% |
| Other values (101) | 165579 | 70.1% |
| (Missing) | 5805 | 2.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1524 | 0.6% |
| 2 | 210 | 0.1% |
| 3 | 567 | 0.2% |
| 4 | 1216 | 0.5% |
| 5 | 4391 | 1.9% |
| 6 | 1313 | 0.6% |
| 7 | 1534 | 0.6% |
| 8 | 1858 | 0.8% |
| 9 | 564 | 0.2% |
| 10 | 1228 | 0.5% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 969 | 313 | 0.1% |
| 316 | 4566 | 1.9% |
| 310 | 1989 | 0.8% |
| 308 | 4164 | 1.8% |
| 301 | 1180 | 0.5% |
| 281 | 365 | 0.2% |
| 258 | 2391 | 1.0% |
| 232 | 2018 | 0.9% |
| 219 | 652 | 0.3% |
| 208 | 1646 | 0.7% |
Auto
The auto setting is an interpretable pairwise column metric with the following mapping:
- Variable_type-Variable_type : Method, Range
- Categorical-Categorical : Cramér's V, [0,1]
- Numerical-Categorical : Cramér's V, [0,1] (using a discretized numerical column)
- Numerical-Numerical : Spearman's ρ, [-1,1]
This configuration uses the recommended metric for each pair of columns.
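The Numerical-Categorical row of the mapping can be illustrated with a minimal pure-Python sketch. It assumes equal-width binning for the numerical column (the report does not state how the discretization is done, so `discretize` and its `bins` parameter are hypothetical) and applies the bias-corrected Cramér's V of Bergsma (2013). The toy values are invented for illustration and are not taken from the dataset.

```python
from collections import Counter
from math import sqrt


def discretize(values, bins=4):
    """Map numbers to equal-width bin indices (binning scheme is an assumption)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 when all values are equal
    return [min(int((v - lo) / width), bins - 1) for v in values]


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(a)
    joint = Counter(zip(a, b))
    row, col = Counter(a), Counter(b)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for r_label, r_count in row.items():
        for c_label, c_count in col.items():
            expected = r_count * c_count / n
            observed = joint.get((r_label, c_label), 0)
            chi2 += (observed - expected) ** 2 / expected

    r, k = len(row), len(col)
    phi2_corr = max(0.0, chi2 / n - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Hypothetical toy columns: a numerical fee and a categorical province.
fees = [0.0, 0.0, 10.0, 10.0, 40.0, 40.0]
provinces = ["Bangkok", "Bangkok", "Nonthaburi", "Nonthaburi",
             "Samut Sakhon", "Samut Sakhon"]

fee_bins = discretize(fees, bins=4)
v = cramers_v_corrected(fee_bins, provinces)
```

In this toy input each fee bin maps to exactly one province, so the association is perfect and `v` comes out at 1 up to floating-point error. The result depends on the chosen number of bins, which is one reason to treat the binning step as a tunable assumption.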
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and 1 indicating total positive monotonic correlation.
To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
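As a minimal sketch of that calculation, the following pure-Python code ranks both variables (ties receive the average of their positions, a common convention assumed here) and then divides the covariance of the ranks by the product of their standard deviations. The function names and toy values are illustrative only.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Covariance of the rank variables divided by the product of their standard deviations."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in rx) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in ry) / n)
    return cov / (std_x * std_y)


# A nonlinear but strictly increasing relationship (toy data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
```

Because `y` increases whenever `x` does, the ranks coincide and `rho` equals 1 up to floating-point error, even though the relationship is not linear.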
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and 1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.
To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
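The calculation can be sketched in a few lines of pure Python. The function name and the toy values below are illustrative and not taken from the dataset; the second call checks the location and scale invariance for a positive rescaling.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


# Hypothetical toy values.
fee = [0.0, 10.0, 20.0, 35.0, 50.0]
seconds = [300.0, 420.0, 610.0, 800.0, 1150.0]

r = pearson_r(fee, seconds)

# Shifting and positively rescaling one variable leaves r unchanged.
r_rescaled = pearson_r([2.0 * v + 7.0 for v in fee], seconds)
```

Using the population form (dividing by n) for both the covariance and the standard deviations is a choice made here for brevity; the sample form (dividing by n - 1) gives the same r because the factors cancel.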
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and 1 indicating total positive correlation.
To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
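The pair-counting definition can be written out directly. The sketch below implements exactly that formula, which is usually called τ-a; it is quadratic in the number of observations and makes no adjustment for ties, whereas statistical libraries commonly offer a tie-adjusted variant (τ-b). The function name and toy values are illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both variables move in the same direction
        elif sign < 0:
            discordant += 1  # the variables move in opposite directions
        # sign == 0 means a tie in x or y; the pair counts toward neither
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Toy data: one swapped pair out of six.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
```

Here five of the six pairs are concordant and one is discordant, so `tau` is (5 - 1) / 6.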
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure that was proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available for Phik.
First rows
| | order_date | order_day_of_week | order_hour | delivery_status | payment_method | coupon_usage | delivery_fee | food_price | cooking_time_in_seconds | delivery_time_in_seconds | restaurant_id | restaurant_category | restaurant_type | province | nb_menu_items |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Monday | 12 | COMPLETED | CASH | NO COUPON | 10.0 | 100.0 | 424.0 | 691.0 | 63 | Rice Dish | NON-CHAIN | Nakhon Pathom | 14.0 |
| 1 | 1 | Monday | 11 | COMPLETED | CASH | COUPON USED | 75.0 | 0.0 | NaN | 285.0 | 70 | Rice Dish | CHAIN_RESTAURANT | Bangkok | 66.0 |
| 2 | 1 | Monday | 9 | COMPLETED | CASH | NO COUPON | 0.0 | 130.0 | 345.0 | 602.0 | 106 | Delivery Only | NON-CHAIN | Bangkok | 15.0 |
| 3 | 1 | Monday | 18 | COMPLETED | CASH | NO COUPON | 20.0 | 80.0 | 710.0 | 367.0 | 27 | Street Food/Food Stands | NON-CHAIN | Samut Sakhon | 14.0 |
| 4 | 1 | Monday | 8 | COMPLETED | CASH | NO COUPON | 10.0 | 60.0 | 659.0 | 717.0 | 63 | Rice Dish | NON-CHAIN | Nakhon Pathom | 14.0 |
| 5 | 1 | Monday | 18 | COMPLETED | CASH | NO COUPON | 20.0 | 414.0 | 724.0 | 1222.0 | 33 | Chinese | NON-CHAIN | Bangkok | 23.0 |
| 6 | 1 | Monday | 9 | COMPLETED | CASH | NO COUPON | 0.0 | 60.0 | 950.0 | 3.0 | 27 | Street Food/Food Stands | NON-CHAIN | Samut Sakhon | 14.0 |
| 7 | 1 | Monday | 11 | COMPLETED | CASH | NO COUPON | 0.0 | 160.0 | 560.0 | 984.0 | 63 | Rice Dish | NON-CHAIN | Nakhon Pathom | 14.0 |
| 8 | 1 | Monday | 11 | COMPLETED | CASH | NO COUPON | 0.0 | 225.0 | 491.0 | 976.0 | 27 | Street Food/Food Stands | NON-CHAIN | Samut Sakhon | 14.0 |
| 9 | 1 | Monday | 14 | COMPLETED | RLP | NO COUPON | 10.0 | 100.0 | 1117.0 | 412.0 | 123 | Rice Dish | NON-CHAIN | Pathum Thani | 21.0 |
Last rows
| | order_date | order_day_of_week | order_hour | delivery_status | payment_method | coupon_usage | delivery_fee | food_price | cooking_time_in_seconds | delivery_time_in_seconds | restaurant_id | restaurant_category | restaurant_type | province | nb_menu_items |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 236132 | 365 | Monday | 17 | COMPLETED | CASH | COUPON USED | 94.0 | 860.0 | 1163.0 | 1412.0 | 26 | Thai | NON-CHAIN | Bangkok | 179.0 |
| 236133 | 365 | Monday | 11 | CANCELED_BY_USER | CASH | COUPON USED | 81.0 | 1100.0 | NaN | NaN | 26 | Thai | NON-CHAIN | Bangkok | 179.0 |
| 236134 | 365 | Monday | 13 | COMPLETED | CASH | COUPON USED | 0.0 | 50.0 | 346.0 | 335.0 | 85 | Street Food/Food Stands | NON-CHAIN | Bangkok | 8.0 |
| 236135 | 365 | Monday | 22 | COMPLETED | CASH | COUPON USED | 0.0 | 55.0 | 523.0 | 276.0 | 183 | Thai | NON-CHAIN | Pathum Thani | 23.0 |
| 236136 | 365 | Monday | 14 | COMPLETED | CASH | COUPON USED | 0.0 | 80.0 | 414.0 | 248.0 | 98 | Bubble Milk Tea | NON-CHAIN | Bangkok | 25.0 |
| 236137 | 365 | Monday | 15 | COMPLETED | CASH | COUPON USED | 137.0 | 960.0 | 1464.0 | 1403.0 | 26 | Thai | NON-CHAIN | Bangkok | 179.0 |
| 236138 | 365 | Monday | 14 | COMPLETED | CASH | COUPON USED | 0.0 | 50.0 | 207.0 | 401.0 | 85 | Street Food/Food Stands | NON-CHAIN | Bangkok | 8.0 |
| 236139 | 365 | Monday | 17 | COMPLETED | CASH | COUPON USED | 25.0 | 326.0 | 926.0 | 658.0 | 40 | Steak House/Barbeque | NON-CHAIN | Nonthaburi | 119.0 |
| 236140 | 365 | Monday | 17 | COMPLETED | CASH | COUPON USED | 0.0 | 240.0 | 709.0 | 497.0 | 185 | North East | NON-CHAIN | Bangkok | 41.0 |
| 236141 | 365 | Monday | 11 | COMPLETED | CASH | COUPON USED | 0.0 | 50.0 | 287.0 | 620.0 | 85 | Street Food/Food Stands | NON-CHAIN | Bangkok | 8.0 |
Most frequently occurring
| | order_date | order_day_of_week | order_hour | delivery_status | payment_method | coupon_usage | delivery_fee | food_price | cooking_time_in_seconds | delivery_time_in_seconds | restaurant_id | restaurant_category | restaurant_type | province | nb_menu_items | # duplicates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 397 | 115 | Wednesday | 16 | EXPIRED_BY_DRIVER | CASH | COUPON USED | 0.0 | 99.0 | NaN | NaN | 12 | Bakery/Cake | CHAIN_RESTAURANT | Bangkok | 21.0 | 20 |
| 914 | 197 | Monday | 20 | CANCELED_BY_USER | CASH | COUPON USED | 107.0 | 0.0 | NaN | NaN | 17 | Moo Kata | NON-CHAIN | Bangkok | 25.0 | 17 |
| 390 | 115 | Wednesday | 16 | CANCELED_BY_USER | CASH | COUPON USED | 0.0 | 99.0 | NaN | NaN | 12 | Bakery/Cake | CHAIN_RESTAURANT | Bangkok | 21.0 | 10 |
| 391 | 115 | Wednesday | 16 | CANCELED_BY_USER | CASH | COUPON USED | 10.0 | 99.0 | NaN | NaN | 12 | Bakery/Cake | CHAIN_RESTAURANT | Bangkok | 21.0 | 9 |
| 448 | 119 | Sunday | 23 | CANCELED_BY_USER | CASH | COUPON USED | 10.0 | 70.0 | NaN | NaN | 79 | Rice Dish | NON-CHAIN | Samut Prakan | 160.0 | 9 |
| 399 | 115 | Wednesday | 16 | EXPIRED_BY_DRIVER | CASH | COUPON USED | 10.0 | 99.0 | NaN | NaN | 12 | Bakery/Cake | CHAIN_RESTAURANT | Bangkok | 21.0 | 8 |
| 977 | 215 | Friday | 10 | CANCELED_BY_USER | CASH | COUPON USED | 35.0 | 145.0 | NaN | NaN | 91 | Café/Coffee Shop | CHAIN_RESTAURANT | Bangkok | 81.0 | 8 |
| 501 | 124 | Friday | 19 | CANCELED_BY_USER | CASH | COUPON USED | 10.0 | 55.0 | NaN | NaN | 70 | Rice Dish | CHAIN_RESTAURANT | Bangkok | 66.0 | 7 |
| 1202 | 295 | Monday | 11 | CANCELED_BY_USER | CASH | COUPON USED | 29.0 | 350.0 | NaN | NaN | 149 | Bakery/Cake | NON-CHAIN | Bangkok | 42.0 | 7 |
| 245 | 75 | Friday | 19 | CANCELED_BY_USER | CASH | NO COUPON | 50.0 | 110.0 | NaN | NaN | 122 | Northern Food | NON-CHAIN | Bangkok | 38.0 | 6 |